The choanoflagellate pore-forming lectin SaroL-1 punches holes in cancer cells by targeting the tumor-related glycosphingolipid Gb3

Choanoflagellates are primitive protozoa used as models for animal evolution. They express a large variety of multi-domain proteins contributing to adhesion and cell communication, thereby providing a rich repertoire of molecules for biotechnology. Adhesion often involves proteins that adopt a β-trefoil fold with carbohydrate-binding properties and are therefore classified as lectins. Screening sequence databases with a dedicated method produced TrefLec, a database of 44714 β-trefoil candidate lectins across 4497 species. A search of TrefLec for original domain combinations singled out SaroL-1 from the choanoflagellate Salpingoeca rosetta, which contains both β-trefoil and aerolysin-like pore-forming domains. Recombinant SaroL-1 is shown to bind galactose and derivatives, with a stronger affinity for cancer-related α-galactosylated epitopes such as the glycosphingolipid Gb3 when embedded in giant unilamellar vesicles or cell membranes. Crystal structures of complexes with Gb3 trisaccharide and GalNAc provided the basis for building a model of the oligomeric pore. Finally, the finding that recognition of the αGal epitope on glycolipids is required for hemolysis of rabbit erythrocytes suggests that toxicity toward cancer cells is achieved through carbohydrate-dependent pore formation.


Full wwPDB X-ray Structure Validation Report (*For Manuscript Review*) 7R55

1 Overall quality at a glance
The following experimental techniques were used to determine the structure:

X-RAY DIFFRACTION
The reported resolution of this entry is 1.84 Å.
Percentile scores (ranging between 0-100) for global validation metrics of the entry are shown in the following graphic. The table shows the number of entries on which the scores are based.
The table below summarises the geometric issues observed across the polymeric chains and their fit to the electron density. The red, orange, yellow and green segments of the lower bar indicate the fraction of residues that contain outliers for >=3, 2, 1 and 0 types of geometric quality criteria respectively. A grey segment represents the fraction of residues that are not modelled. The numeric value for each fraction is indicated below the corresponding segment, with a dot representing fractions <=5%. The upper red bar (where present) indicates the fraction of residues that have poor fit to the electron density. The numeric value is given above the bar.

2 Entry composition
There are 4 unique types of molecules in this entry. The entry contains 10902 atoms, of which 5083 are hydrogens and 0 are deuteriums.
In the tables below, the ZeroOcc column contains the number of atoms modelled with zero occupancy, the AltConf column contains the number of residues with at least one atom in alternate conformation and the Trace column contains the number of residues modelled with at most 2 atoms.

3 Residue-property plots
These plots are drawn for all protein, RNA, DNA and oligosaccharide chains in the entry. The first graphic for a chain summarises the proportions of the various outlier classes displayed in the second graphic. The second graphic shows the sequence view annotated by issues in geometry and electron density. Residues are color-coded according to the number of geometric quality criteria for which they contain at least one outlier: green = 0, yellow = 1, orange = 2 and red = 3 or more. A red dot above a residue indicates a poor fit to the electron density (RSRZ > 2). Stretches of 2 or more consecutive residues without any outlier are shown as a green connector. Residues present in the sample, but not in the model, are shown in grey.
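The colour-coding rule described above can be written down in a few lines. The following is only an illustrative sketch, not the validation pipeline's own code: the function names (`residue_colour`, `has_poor_density_fit`, `colour_fractions`) are hypothetical, and the per-residue outlier counts would have to come from the report itself.

```python
from collections import Counter

# Colours in order of increasing number of outlier types (0, 1, 2, 3 or more).
COLOURS = ("green", "yellow", "orange", "red")


def residue_colour(n_outlier_types: int) -> str:
    """Map the number of geometric criteria with an outlier to a colour."""
    if n_outlier_types < 0:
        raise ValueError("outlier count cannot be negative")
    # Three or more outlier types all map to red.
    return COLOURS[min(n_outlier_types, 3)]


def has_poor_density_fit(rsrz: float) -> bool:
    """A residue is flagged with a red dot when RSRZ > 2."""
    return rsrz > 2.0


def colour_fractions(outlier_counts):
    """Fraction of modelled residues in each colour class (hypothetical helper)."""
    if not outlier_counts:
        raise ValueError("at least one residue is required")
    tally = Counter(residue_colour(n) for n in outlier_counts)
    total = len(outlier_counts)
    return {colour: tally[colour] / total for colour in COLOURS}
```

Residues that are present in the sample but not modelled (the grey segment) are deliberately left out of this sketch, since they have no geometry to score.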
• Molecule 1: Sarol-1, Chain A

Xtriage's analysis on translational NCS is as follows: The analyses of the Patterson function reveals a significant off-origin peak that is 42.17 % of the origin peak, indicating pseudo-translational symmetry. The chance of finding a peak of this or larger height randomly in a structure without pseudo-translational symmetry is equal to 2.1221e-04. The detected translational NCS is most likely also responsible for the elevated intensity ratio.

5 Model quality

Standard geometry
Bond lengths and bond angles in the following residue types are not validated in this section: GAL, GLA, GLC, EDO.
The Z score for a bond length (or angle) is the number of standard deviations the observed value is removed from the expected value. A bond length (or angle) with |Z| > 5 is considered an outlier worth inspection. RMSZ is the root-mean-square of all Z scores of the bond lengths (or angles).
The all-atom clashscore is defined as the number of clashes found per 1000 atoms (including hydrogen atoms). The all-atom clashscore for this structure is 2.
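The definitions of Z score, RMSZ and clashscore given above translate directly into arithmetic. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, and the expected values and standard deviations would in practice come from a restraint library, which is not reproduced here.

```python
import math


def z_score(observed: float, expected: float, sigma: float) -> float:
    """Number of standard deviations the observed value lies from the expected one."""
    return (observed - expected) / sigma


def rmsz(z_scores) -> float:
    """Root-mean-square of a collection of Z scores."""
    if not z_scores:
        raise ValueError("at least one Z score is required")
    return math.sqrt(sum(z * z for z in z_scores) / len(z_scores))


def geometry_outliers(z_scores, cutoff: float = 5.0):
    """Z scores whose absolute value exceeds the cutoff (|Z| > 5 in this section)."""
    return [z for z in z_scores if abs(z) > cutoff]


def clashscore(n_clashes: int, n_atoms: int) -> float:
    """Clashes per 1000 atoms, hydrogens included in the atom count."""
    return 1000.0 * n_clashes / n_atoms


# Hypothetical bond lengths in Å: (observed, expected, sigma). Not taken from 7R55.
example_bonds = [(1.530, 1.525, 0.020), (1.335, 1.329, 0.014), (1.460, 1.458, 0.019)]
example_z = [z_score(o, e, s) for o, e, s in example_bonds]
example_rmsz = rmsz(example_z)
```

The `cutoff` argument is left as a parameter because the threshold is not the same in every part of the report.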

All (23)
... symmetry operator and encoded unit-cell translations to be applied.

In the following table, the Percentiles column shows the percent Ramachandran outliers of the chain as a percentile score with respect to all X-ray entries followed by that with respect to entries of similar resolution. The Analysed column shows the number of residues for which the backbone conformation was analysed, and the total number of residues.

In the following table, the Percentiles column shows the percent sidechain outliers of the chain as a percentile score with respect to all X-ray entries followed by that with respect to entries of similar resolution. The Analysed column shows the number of residues for which the sidechain conformation was analysed, and the total number of residues.

5.4 Non-standard residues in protein, DNA, RNA chains
There are no non-standard protein/DNA/RNA residues in this entry.

Carbohydrates
12 monosaccharides are modelled in this entry.
In the following table, the Counts columns list the number of bonds (or angles) for which Mogul statistics could be retrieved, the number of bonds (or angles) that are observed in the model and the number of bonds (or angles) that are defined in the Chemical Component Dictionary. The Link column lists molecule types, if any, to which the group is linked. The Z score for a bond length (or angle) is the number of standard deviations the observed value is removed from the expected value. A bond length (or angle) with |Z| > 2 is considered an outlier worth inspection. RMSZ is the root-mean-square of all Z scores of the bond lengths (or angles).

There are no chirality outliers.

All (9) torsion outliers are listed below:

Mol Chain Res Type
There are no ring outliers.
1 monomer is involved in 1 short contact:

Mol  Chain  Res  Type  Clashes  Symm-Clashes
2    F      1    GLC   1        0

The following is a two-dimensional graphical depiction of Mogul quality analysis of bond lengths, bond angles, torsion angles, and ring geometry for oligosaccharide.

Oligosaccharide Chain C
There are no bond angle outliers.
There are no chirality outliers.
There are no torsion outliers.
There are no ring outliers.
No monomer is involved in short contacts.

Other polymers
There are no such residues in this entry.

Polymer linkage issues
There are no chain breaks in this entry.

6 Fit of model and data
6.1 Protein, DNA and RNA chains
In the following table, the column labelled '#RSRZ> 2' contains the number (and percentage) of RSRZ outliers, followed by percent RSRZ outliers for the chain as percentile scores relative to all X-ray entries and entries of similar resolution. The OWAB column contains the minimum, median, 95th percentile and maximum values of the occupancy-weighted average B-factor per residue. The column labelled 'Q< 0.9' lists the number (and percentage) of residues with an average occupancy less than 0.9.

The following is a graphical depiction of the model fit to experimental electron density for oligosaccharide. Each fit is shown from different orientations to approximate a three-dimensional view.

Electron density around Chains C, D, E and F: 2mFo-DFc (at 0.7 rmsd) in gray; mFo-DFc (at 3 rmsd) in purple (negative) and green (positive).

Ligands
In the following table, the Atoms column lists the number of modelled atoms in the group and the number defined in the chemical component dictionary. The B-factors column lists the minimum, median, 95th percentile and maximum values of B factors of atoms in the group. The column labelled 'Q< 0.9' lists the number of atoms with occupancy less than 0.9.
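The B-factor summaries used in the tables of this section (minimum, median, 95th percentile, maximum) and the occupancy-weighted average B-factor can be sketched as follows. This is an illustration only: the function names are hypothetical, atoms are represented as plain `(occupancy, b_factor)` pairs, and the nearest-rank percentile used here is an assumption, since the report does not state which percentile convention it uses.

```python
import math
from statistics import median


def owab(atoms) -> float:
    """Occupancy-weighted average B-factor for one residue.

    atoms: iterable of (occupancy, b_factor) pairs.
    """
    atoms = list(atoms)
    total_occupancy = sum(q for q, _ in atoms)
    if total_occupancy == 0:
        raise ValueError("residue has zero total occupancy")
    return sum(q * b for q, b in atoms) / total_occupancy


def mean_occupancy(atoms) -> float:
    """Average occupancy of the atoms in a residue (compared against 0.9)."""
    atoms = list(atoms)
    return sum(q for q, _ in atoms) / len(atoms)


def summarise(values):
    """Return (min, median, 95th percentile, max) of a list of B-factor values.

    The 95th percentile uses the nearest-rank method (an assumption).
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("at least one value is required")
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return ordered[0], median(ordered), ordered[rank - 1], ordered[-1]


def count_low_occupancy(occupancies, threshold: float = 0.9) -> int:
    """Number of entries with occupancy below the threshold (the 'Q< 0.9' column)."""
    return sum(1 for q in occupancies if q < threshold)
```

For the per-residue table, `summarise` would be applied to the `owab` values of all residues in a chain; for the ligand table, it would be applied directly to the atomic B factors of the group.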

Other polymers
There are no such residues in this entry.